Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15


Abstract

This paper looks at the impact of changing Spark's configuration parameters on machine learning algorithms using a large dataset, the UNSW-NB15 dataset. The environmental conditions that optimize the classification process are studied. Building smart intrusion detection systems requires a deep understanding of these environmental parameters. Specifically, the study focuses on the following parameters: executor memory, number of executors, number of cores per executor, and execution time, as well as statistical measures. Hence, the objective was to optimize resource usage and minimize processing time for Decision Tree classification using Spark. The study shows whether additional resources increase performance, lower processing time, and optimize computing resources. Because the UNSW-NB15 dataset is large, it provides enough data and complexity to observe the effects of configuration changes. Principal Component Analysis was used for preprocessing. Results indicated that too few executors result in wasted resources and long processing times. Excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
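The three tuned parameters correspond to standard Spark properties: `spark.executor.memory`, `spark.executor.instances` and `spark.executor.cores`. As a rough illustration of how a configuration sweep like this might be set up, the Python sketch below enumerates combinations of the three parameters. It keeps only the combinations that fit a cluster's total cores and memory, then renders each one as `spark-submit --conf` arguments. It is not the authors' code. The class and function names (`ExecutorConfig`, `config_grid`, `spark_submit_args`), the candidate values and the 16-core/64 GiB cluster are all hypothetical. The memory check assumes Spark's default executor overhead of max(384 MiB, 10% of executor memory). It also ignores the driver and per-node packing, so treat it as a coarse feasibility filter only.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

# Assumed Spark default executor overhead: max(384 MiB, 10% of executor memory).
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


@dataclass(frozen=True)
class ExecutorConfig:
    """One point in the sweep (hypothetical helper, not from the paper)."""
    executor_memory_gib: int
    num_executors: int
    cores_per_executor: int

    def container_mib(self) -> int:
        """Memory requested per executor: heap plus default overhead."""
        heap = self.executor_memory_gib * 1024
        overhead = max(MIN_OVERHEAD_MIB, int(heap * OVERHEAD_FACTOR))
        return heap + overhead

    def total_cores(self) -> int:
        return self.num_executors * self.cores_per_executor

    def total_mib(self) -> int:
        return self.num_executors * self.container_mib()

    def to_conf(self) -> dict[str, str]:
        """Standard Spark properties for the three tuned parameters."""
        return {
            "spark.executor.memory": f"{self.executor_memory_gib}g",
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.cores_per_executor),
        }


def config_grid(memories, executors, cores, cluster_cores, cluster_mib):
    """Keep combinations whose totals fit the cluster (ignores driver and per-node packing)."""
    return [
        cfg
        for cfg in (ExecutorConfig(m, n, c) for m, n, c in product(memories, executors, cores))
        if cfg.total_cores() <= cluster_cores and cfg.total_mib() <= cluster_mib
    ]


def spark_submit_args(cfg: ExecutorConfig) -> list[str]:
    """Render a config as spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in cfg.to_conf().items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical cluster: 16 cores and 64 GiB available to executors.
    grid = config_grid([2, 4, 8], [1, 2, 4, 8], [1, 2, 4], 16, 64 * 1024)
    for cfg in grid:
        print(" ".join(spark_submit_args(cfg)))
```

Each printed line can be appended to a `spark-submit` command. Timing the same PCA and Decision Tree job under each surviving configuration would then give the kind of comparison the abstract describes. The grid filter is deliberately conservative. A configuration that passes it may still fail to schedule if no single node can host an executor of that size.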


Similar Resources

Using Data Mining and Three Decision Tree Algorithms to Optimize the Repair and Maintenance Process

The purpose of this research is to predict device failures using a data mining tool. To this end, a database of 392 failure records from a pharmaceutical company in 1394 was first compiled. Nine characteristics were then analyzed, with the failure type serving as the class attribute. In this regard, three decision tree algorithms have ...


CourboSpark: Decision Tree for Time-series on Spark

With the deployment of smart meters across many countries, data are being collected at a large scale and volume. These data are collected for billing purposes but also to get analytical insights. Our main goal here is to build an understandable model able to explain the electric consumption patterns regarding several features. We chose to use decision tree models as they are easily comprehensib...


Research of Decision Tree on YARN Using MapReduce and Spark

Decision tree is one of the most widely used classification methods. For massive data processing, MapReduce is a good choice. However, it is not suitable for iterative algorithms. Spark's programming model was proposed as a memory-based framework suited to iterative algorithms and interactive data mining. In this paper, C4.5 is implemented on both MapReduce and Spark. The resul...


Predicting Twist Condition by Bayesian Classification and Decision Tree Techniques

Railway infrastructure is among a country's most important national assets. Most of the annual budget of infrastructure managers is spent on repairing, improving, and maintaining railways. The best repair method should consider all economic and technical aspects of the problem. In recent years, data analysis of maintenance records has contributed significantly to minimizing costs. B...


Decision Tree Classification on Outsourced Data

This paper proposes a client-server decision tree learning method for outsourced private data. The privacy model is anatomization/fragmentation: the server sees data values, but the link between sensitive and identifying information is encrypted with a key known only to clients. Clients have limited processing and storage capability. Both sensitive and identifying information thus are stored on...



Journal

Journal title: Big Data and Cognitive Computing

Year: 2022

ISSN: 2504-2289

DOI: https://doi.org/10.3390/bdcc6020038